I’ve set up a basic pipeline in GitLab CI to build and publish my [[org-roam]] wiki automatically. Every time I commit some changes to the git repo that I store it in, the pipeline runs my [[org-publish]] bizness and then rsyncs the output html to my server.
I’ve been publishing from my local machine up until now. It’s OK, but it takes a long time to rebuild all of the pages. As such, I tend to just publish files individually as I create them. But sometimes I miss pages, and sometimes I forget to publish at all. I’m getting to the point where I’d like to have it auto-published after making changes.
Another approach could be to just make the build quicker… but I’m not sure how to do that just yet.
But generally, I do just like this kind of stuff, setting up continuous integration and things. Your mileage may vary - if you find this ops stuff a bit tedious, the benefits might not really be worth it.
What do I want to happen?
To get a CI pipeline on GitLab you just need a .gitlab-ci.yml file in the root of the repo.
My .gitlab-ci.yml for the wiki has two stages in its pipeline - one to build the org files, and one to rsync them to my server.
stages:
  - build-org
  - publish
The second one, the deployment stage, could be swapped out to deploy the files to GitLab Pages or something similar.
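For the GitLab Pages route, a minimal sketch might look like the job below. It assumes the build job still exports _posts as an artifact. GitLab Pages expects the job to be named pages and the site to end up in a public directory. This is a sketch rather than a tested config.

```yaml
# Sketch only: replaces the rsync job with a GitLab Pages deployment.
pages:
  stage: publish
  script:
    # Pages serves whatever ends up in public/, so copy the built HTML there.
    - mkdir -p public
    - cp -r _posts/. public/
  artifacts:
    paths:
      - public
```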
org-generation:
  image:
    name: silex/emacs
  artifacts:
    paths:
      - _posts
  before_script:
    - apt-get update
    - apt-get --yes --force-yes install sqlite3
    - apt-get --yes --force-yes install git-restore-mtime
  script:
    - git restore-mtime
    - emacs -batch -q -l publish.el -f commonplace/publish-gitlab
  stage: build-org
I use the silex/emacs image for the org-generation job, which is just Ubuntu with Emacs installed. The job calls a function commonplace/publish-gitlab in my [[]]. This function configures org-publish-project-alist. I used to have this configuration happen just globally in publish.el, but now I need it parameterised based on where I’m calling it from (locally, or from CI), to pass in the source / output directories.
I started out using Alpine Linux images for the steps, because (I think?) they pull down quicker given they’re so barebones. But I got to a couple of points where I wasn’t sure how to do a couple of things on Alpine, so I reverted back to Debian/Ubuntu for now. Maybe I’ll go back to Alpine at a later date to speed up the build a bit.
I found setting up ssh capabilities a bit of a faff in GitLab CI. Maybe it’s just the first time you do it, probably easier in future now I’m used to it.
rsync:
  image: debian
  before_script:
    - apt-get update
    - apt-get --yes --force-yes install rsync
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan commonplace.doubleloop.net >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
    - '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'
  script:
    - rsync -chavz _posts/ 37.218.246.201:/var/www/commonplace/
  stage: publish
I still have a problem with host key checking that I need to sort out better.
- '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'
This solution is a hack.
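One less hacky option, sketched here under assumptions, is to pin the server's host key in a CI/CD variable rather than switching checking off. SSH_KNOWN_HOSTS is a hypothetical variable name; it would hold the output of running ssh-keyscan against the server once from a trusted machine. Note also that the rsync command above connects by IP address while ssh-keyscan is run against the hostname. known_hosts entries are matched on the name used to connect, so that mismatch may be part of the problem. The sketch assumes commonplace.doubleloop.net resolves to the same server as the IP.

```yaml
rsync:
  image: debian
  before_script:
    - apt-get update
    - apt-get --yes install rsync openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    # SSH_KNOWN_HOSTS is an assumed CI/CD variable holding the pinned host key line(s).
    - echo "$SSH_KNOWN_HOSTS" >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
  script:
    # Connect using the same name the pinned key was recorded under.
    - rsync -chavz _posts/ commonplace.doubleloop.net:/var/www/commonplace/
  stage: publish
```

With the key pinned, the StrictHostKeyChecking override is no longer needed, and a changed host key fails the job instead of being silently accepted.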
I use org-publish’s :sitemap-sort-files set to anti-chronologically to produce a [[Recent Changes]] file. This uses the org files’ mtimes in order to figure out when they last changed. When you clone a git repo, you lose these.
So I added in some steps to call git-restore-mtime to set the files’ mtimes based on their last git commits.
By default, the GitLab runner only pulls in the last 50 commits of your repo. Which means restore-mtime doesn’t see all of the logs. I changed this with:
variables:
  GIT_DEPTH: 0
in the yaml. I don’t think it’s ideal, as doing a shallow clone is a good idea for performance. To be revisited.
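A possible middle ground, as a sketch: set the variables per job instead of globally, so that only the build job pays for the full clone. The fragments below are meant to be merged into the existing job definitions. GIT_STRATEGY: none assumes the rsync job needs nothing from the repo beyond the _posts artifact.

```yaml
org-generation:
  variables:
    # Full history, so git restore-mtime can see every commit.
    GIT_DEPTH: 0

rsync:
  variables:
    # Skip fetching the repo; artifacts from earlier stages are still downloaded.
    GIT_STRATEGY: none
```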
I have a function that is a preprocessor of org-export, that generates backlinks and inserts them into the pages. It was working locally, but not remotely.
In order for backlinks to get generated correctly again, I needed to force org-roam’s cache to be rebuilt on the CI server.
I added:
(org-roam-db-build-cache t)
to commonplace/publish-gitlab. The argument forces a clearout of the cache before rebuilding.
I also changed things to ensure I am using the absolute path of the project on the CI server. I’m not sure if this is necessary or not, but I thought maybe org-roam’s DB query was expecting absolute paths, so I chucked it in.
(commonplace/configure (file-truename ".") "posts")
Unfortunately the cache rebuild slows the pipeline down a fair bit.
Build speed continues to be a massive issue. org-export/org-publish itself is really slow. Add in the org-roam cache rebuild, and there’s a lot going on.
Around when I first introduced the pipeline, it took about 6 minutes - way too slow. Currently, as of Feb 2022 with over 3000 org files, it’s taking around 30 minutes!!! This is horrendous. [[Speeding up org-export and org-publish]].
Maybe by switching back to Alpine again, I might trim some time off. And using caching between stages. And if I really cared, I could set up my own Docker images that have all the bits needed already added.
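As a rough sketch of the caching idea: GitLab's cache keyword can carry files over between pipeline runs. The example assumes publish.el points org-publish-timestamp-directory at a .org-timestamps/ directory inside the project (cache paths have to live under the project directory), and that the restored mtimes are stable enough for org-publish to skip unchanged files. Both are assumptions, not something verified here. The fragment is meant to be merged into the existing org-generation job.

```yaml
org-generation:
  cache:
    # Hypothetical key name: one shared cache across all branches.
    key: org-publish
    paths:
      # Assumed location of org-publish's timestamp cache (set in publish.el).
      - .org-timestamps/
      # Previously built HTML, needed because skipped files are not re-exported.
      - _posts/
```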
One nice thing is that having it timed in GitLab means I can get a bit of a grasp of where tweaks can be made.
git-restore-mtime
https://github.com/MestreLion/git-tools/
Seems marginally quicker with the Debian-based image rather than Ubuntu for the rsync step. Only by about 10 seconds though.
No error messages to note in the CI output. I see that locally I am using a slightly older version of org-roam - maybe it is actually just a breaking change in the new one?
No, still seems to work fine locally.
Ah, it seems links are stored in the roam db as absolute paths.
Might need to regenerate roam db on CI?
That seems to have fixed it.
Why? I’m getting to the point where I’d like to have it auto-published after making changes.
Setup a simple CI/CD using Gitlab Pipeline | by Muhammad Ndako | Medium
set up an ssh key just for gitlab (make sure no passphrase)
Issue with Host Key verification failed.
ci tip: remove the build stage temporarily as that takes ages, just focus on the deploy stage while resolving that
some kind of problem with recent changes page when publishing via gitlab ci, they all have today’s date
The pipeline has become horribly slow lately. Not sure what changed, but now it is failing to publish at all because it is timing out on one of the steps.
It seems perhaps to be bailing on tracks.org, which is a massive bullet list. Is that the problem? It didn’t use to be. What changed?
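As a stopgap while the slowdown is investigated, the timeout can be raised for just the build job. The value below is an arbitrary example, and a job-level timeout cannot exceed the runner's own maximum. The fragment is meant to be merged into the existing org-generation job.

```yaml
org-generation:
  # Arbitrary example value; capped by the runner's maximum timeout.
  timeout: 2 hours
```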